Method and apparatus for transmitting digital data in massively parallel systems



A massively parallel system has a self-timed
interface (STI) in which a clock signal clocks bit 
serial data onto a parallel, electrically conductive bus 
and the clock signal is transmitted on a separate line 
of the bus. The received data on each line of the bus 
is individually phase aligned with the clock signal. 



Digital data is transmitted at high speeds via the 
parallel bus to provide a scalable communications 
network for parallel processing systems while elimi- 
nating precise bus length and system clock rates as 
a critical or limiting factor. 

This invention relates to an improved method
and apparatus for transmitting digital data at high 
speeds via a parallel data bus, and more particu- 
larly, to a method and apparatus to provide a cost 
effective, scalable communications network for par- 
allel processing systems while eliminating precise 
bus length and system clock rates as a critical or 
limiting factor in system design. 

The present United States patent application is 
related to the following co-pending United States 
patent applications incorporated herein by refer- 
ence: 

Application Serial No. 262,087, filed 06/17/94 (attor- 
ney Docket No. PO9-93-053), entitled "Digital 
Phase Locked Loop with Improved Edge Detector," 
and assigned to the assignee of this application. 

Application Serial No. 261,515, filed 06/17/94 
(attorney Docket No. PO9-93-054), entitled
"Self-Timed Interface," and assigned to the assignee of
this application. 

Application Serial No. 261,522, filed 06/17/94 
(attorney Docket No. PO9-93-056), entitled "Multiple
Processor Link," and assigned to the assignee
of this application. 

Application Serial No. 261,561, filed 06/17/94 
(attorney Docket No. PO9-93-057), entitled "Enhanced
Input-Output Element," and assigned to the
assignee of this application. 

Application Serial No. 261,523, filed 06/17/94 
(attorney Docket No. PO9-93-059), entitled "At- 
tached Storage Media Link," and assigned to the 
assignee of this application. 

Application Serial No. 261,641, filed 06/17/94 
(attorney Docket No. PO9-93-060), entitled "Shared 
Channel Subsystem," and assigned to the assign- 
ee of this application. 

As will be appreciated by those skilled in the 
art, such factors as noise and loading limit the 
useful length of parallel busses operating at high 
data rates. In the prior art, the length of the bus
must be taken into account in the system design 
and the bus length must be precisely as specified. 
Manufacturing tolerances associated with the physical
communication link (chips, cables, card wiring,
connectors, etc.), together with temperature and variations in
power supply voltage also limit the data rates on 
prior art busses comprised of parallel conductors. 
Further, many prior art computer systems transfer 
data synchronously with respect to a processor 
clock, so that a change in processor clock rate may 
require a redesign of the data transfer bus. 

An increasingly popular means of providing low
cost, high capacity compute capability is to couple
a number of computer resources together via a
high speed switch network. This allows them to
communicate readily with each other to share work
as well as to readily access system resources such
as DASD, print servers, file servers, archival systems,
boot servers, etc., either directly or via gateway
nodes. Typically the number of such network
connections scales at least linearly with the number
of nodes and in many cases goes up geometrically.
As a result, the link technology is a significant
component of the total system in terms of
cost, reliability, space, and power, and can limit the
communication subsystem's performance and
hence the total machine's performance.

An object of this invention is the provision of a
cost effective bus data transfer system that can
operate at high data transfer rates without tight
control of the bus length, and without system clock
constraints; a system in which the maximum bus
length is limited only by the attenuation loss in the
bus.

Another object of the invention is the provision
of a general purpose, low cost, high performance,
point to point data communication link where the
width and speed of the interface can easily be
modified to tailor it to specific bandwidth requirements
and to specific implementation technologies,
including VLSI technologies.

A further object of the invention is the provision
of a bus data transfer system that operates at a clock
speed equal to the data rate.

A more specific object of the invention is the
provision of a system that adjusts the phase or
arrival time of the incoming data on the receive
side so it can be optimally sampled by the local
receive clock, compensating for many of the
manufacturing tolerances associated with the physical
link (chip, cable, card wiring, connectors, etc.) as
well as temperature changes and power supply
output variations.

A further object of the invention is the provision
of a low cost, modular, high bandwidth, highly
reliable interconnect for structuring moderately
parallel systems comprised of microprocessors as well
as for parallel processing machines from just a few
processing nodes to thousands of processing
nodes.

Still another object of the invention is the provision
of a semi-synchronous network linking together
a number of processors.

Briefly, this invention contemplates the provision
of a self-timed interface (STI) in which a clock
signal clocks bit serial data onto a parallel,
electrically conductive bus and the clock signal is
transmitted on a separate line of the bus. The
received data on each line of the bus is individually
phase aligned with the clock signal. The received
clock signal is used to define boundary edges of a
data bit cell individually for each line and the data
on each line of the bus is individually phase adjusted
so that, for example, a data transition position
is in the center of the cell. At the data rates
contemplated in the application of this invention,
the propagation delay is significant. However, within
limits, the bus length is not critical and is
independent of the transmit and receive system
clocks. The phase adjustment can compensate for a
skew of up to several bit cells across the width of
the bus. The self-timed interface is used to link
together a number of processors in a network that
is readily scalable.

The foregoing and other objects, aspects and
advantages will be better understood from the
following detailed description of a preferred
embodiment of the invention with reference to the
drawings, in which:

Figure 1 is an overview block diagram illustrating
the application of a self-timed interface, in
accordance with the teachings of this invention,
to data communication among computer chips.

Figure 2 is a block diagram illustrating one
embodiment of a transmitter serializer for
implementing a self-timed interface in accordance
with this invention.

Figure 3 is a block diagram illustrating byte
synchronization in accordance with the invention.

Figure 4 is a block diagram illustrating the next
step in the byte synchronization process.

Figure 5 illustrates phase alignment and sampling
logic in accordance with a preferred embodiment
of the invention.

Figure 6 is a block diagram of a 64 node switching
network employing the self-timed interface
technology in accordance with the teachings of
this invention.

Figure 7 is a diagram similar to Figure 6 showing
the scalability of the 64 node switching network
in Figure 6 to a 128 node network.

Figure 8 is a block diagram showing an
interconnecting system of nodes operating in a
semi-synchronous manner employing a self-timed
interface in accordance with the teachings
of this invention.

Figure 1 of the drawings illustrates one embodiment
in which a self-timed interface in accordance with
the teachings of this invention can be used. This
exemplary embodiment of the self-timed interface
provides data communications between two
microprocessor chips, labeled here as Chip A and
Chip B. However, as will be apparent to those skilled
in the art, the self-timed interface of this invention
is applicable to provide data transfer between a wide
variety of components or nodes.

Chip A has a transmit port labeled 12A and
Chip B has a transmit port labeled 12B. Similarly,
Chips A and B have receive ports labeled 14A and
14B, respectively. The ports are connected by two
self-timed interface busses 16, one for each
transmission direction. In this exemplary embodiment
of the invention, each bus 16 is one byte wide and
comprises nine electrical conductors: eight
conductors for data and one conductor for a clock
signal.

Each transmit port (12A and 12B) includes a 
transmit logical macro 18 that provides a logical 
interface between the host logic and the self-timed 
interface link 16. Sync buffers 22 provide an inter- 
face between the host clock and the self-timed 
interface clock. This allows the self-timed interface 
link to run at a predetermined cycle time that is 
independent of the host clock, making the self- 
timed interface link independent of the host. An 
outbound physical macro 24 serializes a word-wide 
data flow into a byte-wide data flow that is transmit- 
ted along with the clock on the self-timed interface 
link 16. 

Each receive port (i.e., 14A and 14B) includes 
an inbound physical macro 26 that first dynamically 
aligns each data bit with the self-timed interface 
clock signal. It aligns any bits with skew up to three 
bit cells and deserializes the bytes into words. A 
receive logical macro 28 provides an interface be- 
tween the self-timed interface receiver logic and 
the host logic and generates link acknowledge sig- 
nals and link reject signals, which are coupled by 
internal links 33 and transmitted back to the trans- 
mitting port via an outbound self-timed interface 
link 16. In order to compensate for variations in 
electrical path delay, the phase of the incoming 
data is adjusted, or self-timed. Each bit (line) is 
individually phase aligned to the transmitted refer- 
ence clock and further aligned to compensate, 
within this embodiment, for up to three bit cells of skew
between any two data lines. The self-timing opera- 
tion has three parts. The first is to acquire bit 
synchronization; the second is byte/word align- 
ment; and the third is maintaining synchronization. 

In acquiring bit synchronization, the link takes
itself from a completely untimed state into
synchronous operation. Any previous condition on the
STI interface or logic is disregarded with a complete
logic reset. The bit synchronization process can be
rapidly established, for example on the order of
200 microseconds. The phase of the incoming data
is manipulated on a per line basis until the data
valid window or bit interval is located. This is
accomplished using a phase detector that locates
an average edge position on the incoming data
relative to the local clock. Using two phase
detectors one can locate two consecutive edges on data
and these two consecutive edges define the bit
interval or data valid window. The data to be sampled
by the local clock is the phase of the data
located halfway between the two edges of the data.
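
The midpoint selection described above can be sketched as follows. This is only an illustration under stated assumptions: edge positions are expressed as fractions of a bit time relative to the local clock, and the function name `sampling_phase` is hypothetical rather than taken from the specification.

```python
def sampling_phase(first_edge: float, second_edge: float) -> float:
    """Return the phase halfway between two consecutive data edges.

    Phases are in bit times relative to the local clock (an assumed unit).
    The second edge is taken to follow the first by at most one bit time.
    """
    if second_edge < first_edge:
        # Assume the second edge wrapped past the clock reference; unwrap it.
        second_edge += 1.0
    return (first_edge + second_edge) / 2.0
```

With edges located at 0.25 and 1.25 bit times, the sketch places the sampling point at 0.75 bit times, the center of the data valid window.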

Byte alignment takes place by manipulating the
serial data stream in whole bit times to properly
adjust the byte position relative to a deserializer
output. Referring now to Figure 4, word alignment
takes place next by manipulating the deserializer
data four bit intervals at a time to ensure proper
word synchronization on the STI interface. A timing
sequence allows proper bit, byte and word
synchronization.

Synchronization maintenance occurs as part of 
the link operation in response to temperature and 
power supply variations. 

Figure 2 illustrates one embodiment of a transmit
serializer for a bit serial, byte parallel interface
used in the practice of the invention. Here a four
byte wide data register 23 receives parallel inputs 25
(bytes 0, 1, 2 and 3 inputs shown here), and
multiplexers 19 and 2:1 selector 27 multiplex the
register output to a one byte wide output of off chip
driver 15 coupled to a self-timed interface bus. Data
is clocked from the register 23 by divide-by-two
logic 12, whose input is the self-timed interface
clock signal on line 27. Bit zero of each of bytes
0, 1, 2 and 3 is serialized and transmitted on link 0
of the self-timed interface, shown here. Bit 1 from
bytes 0, 1, 2 and 3 will be
transmitted on link 1 (not shown) and so on.
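
The mapping of word bits onto bus lines can be illustrated with a short sketch. It assumes that bit 0 is the least significant bit of each byte and that bytes are sent in the order 0, 1, 2, 3; neither detail is stated in the text, and the function names are hypothetical.

```python
def serialize_word(word_bytes):
    """Split a 4-byte word into 8 per-line bit streams.

    Line i carries bit i of byte 0, then byte 1, byte 2 and byte 3.
    Bit 0 is taken here to be the least significant bit (an assumption).
    """
    if len(word_bytes) != 4:
        raise ValueError("expected a four byte word")
    return [[(b >> line) & 1 for b in word_bytes] for line in range(8)]


def deserialize_word(lines):
    """Inverse of serialize_word: rebuild the 4 bytes from 8 bit streams."""
    return bytes(
        sum(lines[line][i] << line for line in range(8)) for i in range(4)
    )
```

Each inner list corresponds to the four bit times that one data line carries for a single word, so the eight lines together move one byte per bit time.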

To minimize the bandwidth requirements of the
communication media, the STI clock is one half the
frequency of the transmitted data (baud) rate; i.e., a
75 MHz clock will be used for a 150 Mbit/s data
rate. The clock will be generated from an STI
oscillator source; this is done to decouple the
system or host clock from the STI link. The data will
be transmitted with both edges of the clock.
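
The relationship between clock frequency and data rate can be worked through in a few lines. The per-line figure follows directly from the text; the aggregate byte rate is a derived illustration that assumes the one byte wide bus (eight data lines) of the exemplary embodiment, and the function name is hypothetical.

```python
def sti_rates(clock_hz: float, data_lines: int = 8):
    """Return (per-line bit rate, aggregate byte rate) for a double-edge clock."""
    bits_per_line = 2 * clock_hz  # one bit is sent on each clock edge
    bytes_per_second = bits_per_line * data_lines / 8
    return bits_per_line, bytes_per_second
```

For a 75 MHz clock this gives 150 Mbit/s on each line and, for eight data lines, 150 Mbyte/s across the bus.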

Referring now to Figure 3, assuming a bit
synchronization process as described in connection
with Figure 5 has been completed, byte synchronization
starts by coupling the phase aligned data (now 2
bits wide) into shift registers 33 whose outputs are
coupled to multiplexer 35. Control inputs 37 to the
multiplexer are used to deskew the particular data
line from the other data lines by whole bit times.
The deserializer data output for a particular data
line is monitored for an expected timing pattern
(e.g., X 0 1 0 where X is a don't care) to determine
the proper order of the received data. If at any time
a zero is detected in the bit 3 position, the
multiplexer is incremented, thus moving the byte
boundary by one bit time. This process is repeated
until the proper byte boundary is located. The
multiplexer control wraps around from a binary 3 to
a binary 0 in case the correct position was
incorrectly passed through the previous time; this
function allows synchronization of data lines
skewed by more than an entire bit time.
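
A simplified software model of the byte boundary search may help. It is a sketch, not the hardware: it assumes the don't-care bit X is sent as 0, checks the whole 4-bit group rather than a single bit position, and tries the four multiplexer settings in order. All names are hypothetical.

```python
TIMING_PATTERN = [0, 0, 1, 0]  # "X 0 1 0" with X sent as 0 (assumed)


def matches_pattern(nibble):
    """True when a 4-bit group matches X 0 1 0 (first bit is don't care)."""
    return nibble[1:] == [0, 1, 0]


def find_byte_boundary(line_bits):
    """Step a 4-way select until every complete 4-bit group matches.

    Returns the select value (0..3), or None if no setting matches.
    """
    for select in range(4):
        shifted = line_bits[select:]
        groups = [shifted[i:i + 4] for i in range(0, len(shifted) - 3, 4)]
        if groups and all(matches_pattern(g) for g in groups):
            return select
    return None
```

A line whose stream arrives one bit late relative to the nominal boundary would, in this model, lock at select value 3, which mirrors the wrap-around behaviour of the multiplexer control.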

During normal operation the physical macro
will continuously monitor the incoming data to
ensure that the optimum clock sampling relationship
exists. Small updates will be made to track
temperature, power supply and data jitter. These
updates will be seamless and transparent to the host
logic.

As will be appreciated by those skilled in the
art, any of a number of circuits, such as a digital
phase lock loop, can be used as the self-timer 52
to provide individual phase synchronization
between the clock and the data. However, a preferred
embodiment of the invention uses the novel edge
detector disclosed in co-pending application Serial
No. 262,081 filed 06/17/94, assigned to the
assignee of this application and incorporated
herein by reference.

Referring now to Figure 5, in this embodiment
of the invention, the clock rate is the same as the 
data rate. The data edges that define a data win- 
dow are each detected independently of the other 
and the data is sampled at the midpoint between 
the edges when the edges have been aligned with 
the clock. The position of the edges of increment- 
ally separated phases of the input data stream are 
successively compared to the position of the rising 
and falling edges of the clock in order to locate the 
edges of the data stream with respect to both 
edges of the clock (e.g., the rising and falling 
edges). 

The data phase pairs are generated in this 
specific embodiment of the invention by three in- 
crementally selectable delay elements 80, 82, and 
84. For example, the elements 80 and 82 provide 
delays, respectively, in 1/10th and 1/5th bit time
increments and element 84 provides fine incre- 
ments on the order of 1/20th of a bit time. The fine 
delay element 84 is separated into three groups to 
provide early edge detection, system data detec- 
tion, and late edge detection. An early guard band 
selector 86 successively selects one phase of the 
data stream to provide an "early" phase of the 
incrementally separated phases - one for the rising 
edge and one for the falling edge. Similarly, a late 
guard band selector 90 successively selects one 
phase of the data stream to provide a "late" phase 
of the incremental phases - again one for the rising 
edge and one for the falling edge. A selector 88 
selects incremental phases for the mid-cell system 
data position. 

A selected data phase is coupled as an input 
to master-slave RES-FES latch pairs 92, 94, and 
96. The rising edge data samples are clocked into 
the RES latches and the falling edge data samples 
are clocked into the FES latches. The outputs of 
the RES-FES latch pair 92 are connected to an 
early edge detector 98. Similarly, the outputs of the 
RES-FES latch pair 96 are coupled to a late edge 
detector 100. The RES latch of pair 94 is coupled
to the early edge detector 98 and the FES latch of
pair 94 is coupled to the late edge detector 100.

Each edge detector (98 and 100) outputs a
"lead", a "lag" or a "do nothing" output which
indicates the location of a data edge with respect
to the reference clock edge location. The output of
each edge detector is coupled via a suitable filter
102 (i.e., a random walk filter) back to its
respective selector, 86 or 90. The
selectors shift the phase of the data coupled to the
RES-FES latches in the direction indicated, or if
"do nothing" is indicated, the phase of the data at
that edge is not shifted.
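
One common way to realize a random walk filter is an up/down counter that passes a step to the selector only after a net run of votes in one direction. The sketch below assumes that form; the threshold value and the sign convention (a "lead" vote counts up) are illustrative assumptions, not details from the specification.

```python
class RandomWalkFilter:
    """Up/down counter that emits a step only after a net run of votes.

    The threshold is a hypothetical parameter, not a value from the text.
    """

    def __init__(self, threshold: int = 4):
        self.threshold = threshold
        self.count = 0

    def update(self, vote: str) -> int:
        """Feed 'lead', 'lag' or 'do nothing'; return -1, 0 or +1."""
        if vote == "lead":
            self.count += 1
        elif vote == "lag":
            self.count -= 1
        if self.count >= self.threshold:
            self.count = 0
            return +1
        if self.count <= -self.threshold:
            self.count = 0
            return -1
        return 0
```

Isolated votes caused by jitter cancel one another in the counter, so the selector moves only when the edge has drifted consistently in one direction.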

Data control logic 104 controls the system data
output by selecting the phase of the data that is
halfway between the two data edges when the data
edges are aligned with the reference clock. A
phase of the data (Data 1 and Data 2) is outputted
at each reference clock edge.

In operation of a specific embodiment, at power
on the logic will automatically begin the bit
synchronization process. A 16 microsecond timer
is started, the bulk delays are reset to their minimum
delay and a 16 bit counter running off the
divided down clock is started. The edge detect
circuitry will sample the incoming data with the
received reference clock. The edge detector will
output a "lead", a "lag" or a "do nothing" signal
that indicates the data edge location relative to the
reference clock. This signal is filtered by a Random
Walk Filter (RWF) and fed back to the selectors of
their respective RES and FES circuits. The selectors
shift the phase of the data into the RES and
FES as indicated by the edge detector. Each edge
detector operates independently of the other. Each
will locate the transitions on data relative to the
received (ref) clock by manipulating the incoming
phase of the data into the edge detector as described
above. The phase of the system data is
controlled by the data control logic which selects
the phase of the data halfway between the two
edge detectors. In parallel with the bit synchronization
process, the order of bits out of the deserializer
is manipulated to the correct order (see
byte/word synchronization below).

When the 16 microsecond timer trips, the algorithm
resets a deserializer error latch and restarts the
16 microsecond counter. The deserializer output is
compared against the expected timing pattern (X 0 1 0
where X is a don't care). A single miscompare on
any cycle during the next 16 microseconds will set
the deserializer error latch. When the 16 microsecond
counter trips again the algorithm checks
the addresses of the EGB, LGB, and data selectors
and the deserializer error latch. In order for a bit to
end the initial bit synchronization search state, the
deserializer output latch must have remained reset
AND all the selectors must be properly centered in
their tracking range (centering ensures that adjustments
can be made to allow for the tracking of
temperature and power supply variations after the
initial bit synchronization process). If both conditions
are not met then the algorithm adds a bulk delay
element, resets the 16 microsecond counter and
the search process begins once again.

Each and every bit (data line) on the STI interface
undergoes this process in parallel. Once an individual
data line is determined to meet the initial bit
synchronization criteria described above it is degated
while the other lines continue to be adjusted. The bit
synchronization process is complete once all bits are
adjusted and meet the search criteria. The logic
will not exit the bit synchronization mode until the
16 bit counter trips.
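
The search loop for a single data line can be summarized in a short sketch. Here `observe_window` is a hypothetical callback standing in for one 16 microsecond monitoring window of the hardware; it is an assumption introduced for illustration, as are the function names and the `max_bulk_delays` limit.

```python
def bit_sync_search(observe_window, max_bulk_delays):
    """Search bulk delay settings for one data line.

    observe_window(bulk) is a hypothetical callback standing in for one
    16 microsecond monitoring window; it returns a pair
    (error_latch_set, selectors_centered) for that bulk delay setting.
    Returns the first bulk delay count that satisfies both exit
    conditions, or None if the range is exhausted.
    """
    for bulk in range(max_bulk_delays + 1):
        error_latch_set, selectors_centered = observe_window(bulk)
        if not error_latch_set and selectors_centered:
            return bulk  # line is synchronized and can be degated
        # otherwise add one bulk delay element and search again
    return None


def all_lines_synchronized(per_line_results):
    """Bit synchronization completes only when every line has locked."""
    return all(result is not None for result in per_line_results)
```

Running `bit_sync_search` once per data line and passing the results to `all_lines_synchronized` mirrors the requirement that every line meet the criteria before the process completes.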

Finally word alignment takes place. Referring 
now to Figure 4, word alignment is established by 
manipulating the deserializer output bus four bits at 
a time until word synchronization is established. 
Note that the first register is shifted by four bit 
times relative to the second register. Four bit times 
is the maximum any data bit can be skewed rela- 
tive to another data bit (3 bit times on link + 1 bit 
time from phase alignment section). 

During normal operation the physical macro 
will continuously monitor the incoming data to en- 
sure that the optimum clock sampling relationship 
exists. Small updates will be made to track tem- 
perature, power supply and data jitter. These up- 
dates will be seamless and transparent to the host 
logic. Approximately 1/2 a bit time of delay will be 
needed to compensate for temperature and power 
supply variations to maintain proper synchroniza- 
tion. This added delay is in the fine delay elements 
section. There is also circuitry to monitor the posi- 
tion of the guard bands relative to the allowable 
range of operation. If a guard band reaches the end
of its range, two cases exist. In the first case, a new
bulk delay element is added and the fine delay
elements are adjusted accordingly. Note that this can
cause sampling errors in the data. The circuitry that
makes these on-the-fly bulk adjustments can be
inhibited so that no on-the-fly bulk delay adjustments
are made during normal operation. In the second case,
when one of the guard bands reaches the end of its
range and the on-the-fly bulk delay adjustment is
inhibited, the physical macro will signal the logical
STI macro that a bit synchronization is required
soon. The link should then finish its immediate work
and be forced into timing mode.
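
The two cases can be expressed as a small decision function. This is a sketch only: the selector position and range limits are assumed to be integer tap addresses, and the `Action` names are hypothetical labels for the behaviours described above.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    ADD_BULK_DELAY = "add bulk delay and readjust fine delay"
    REQUEST_RESYNC = "signal logical macro that bit sync is required soon"


def guard_band_action(position, range_min, range_max, on_the_fly_inhibited):
    """Choose the response when a guard band selector nears its limits."""
    at_end = position <= range_min or position >= range_max
    if not at_end:
        return Action.NONE
    if on_the_fly_inhibited:
        return Action.REQUEST_RESYNC
    return Action.ADD_BULK_DELAY
```

Inhibiting the on-the-fly adjustment trades an immediate correction, which may disturb sampling, for a deferred resynchronization that the link can schedule after finishing its current work.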

Referring now to Figure 6, it shows an embodiment
of the invention in which 64 nodes are connected
in parallel by STI links. Each node is a
processor in the network and there are four switch
boards designated here as Rack-1, Rack-2, Rack-3
and Rack-4. Groups of four processor nodes 80 are
connected to non-blocking packet switches 82. A
self-timed interface of the type described above
cross-connects the switches 82 to similar switches
86. A self-timed interface 88 connects each of the
switches 86 and thus each of the nodes 80 in
parallel.

Figure 7 shows how, using STI, a network can
be scaled modularly, here to a network that
interconnects 128 nodes. Here, eight racks with 16
nodes each, labeled A and B, are grouped into four
32 node units. Each rack labeled A or B corresponds
to a rack labeled Rack-1, 2, 3 or 4 in
Figure 6. The racks A and B for each unit are
connected by a self-timed interface to a 16 by 16
switch 90 and the switches 90 for each of the units
are connected in a horizontal (H) and vertical (V)
orientation by self-timed interfaces 92 as shown. In
a similar manner, the number of interconnected
nodes can be scaled upward to 512 and so on.
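
The node counts quoted for Figures 6 and 7 follow from simple arithmetic, sketched below. The defaults of 16 nodes per rack and four nodes per node-level switch are taken from the description; the count of node-level switches is a derived figure, and the function and key names are hypothetical.

```python
def topology(racks: int, nodes_per_rack: int = 16, nodes_per_switch: int = 4):
    """Return node and first-stage switch counts for a rack-based network."""
    nodes = racks * nodes_per_rack
    return {
        "nodes": nodes,
        "units_of_two_racks": racks // 2,
        "node_switches": nodes // nodes_per_switch,
    }
```

Four racks give the 64 node network and eight racks give the 128 node network arranged as four 32 node units.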

Figure 8 shows, in simplified form, the network
of Figure 7 with a voltage controlled oscillator 93
(VCO) on each switch board. The VCO 93 is used
to generate a clock signal at the same frequency
as the reference clock input. VCOs are common
components readily available in the industry. A
reference clock signal from one of the switch
boards is selected with control lines during system
initialization or upon detection of a clock fault. The
selectable reference clock inputs can be any of the
incoming STI clocks or a local fixed frequency
oscillator. During system initialization one of the
switch boards is designated the master; this
board's VCO (VCO-M) will use the fixed frequency
local oscillator as the reference input. The master's
VCO output clock will be distributed to the entire
board and will be the clock for the STI ports on that
board as well. The STI ports will transmit the
master clock to other boards. All other boards will
be designated as slave boards (VCO-S). A slave
board will select the inbound STI clock from the
master as the reference clock to the VCO. The
VCO will output a clock at the same frequency as
the inbound STI master clock to be used as the
local clock for that board. This process continues
through the entire network until all boards are
operating at the same frequency referenced to the
master clock board.

This results in a highly fault tolerant clocking
system. A failure of any STI link that provides a
slave VCO with a reference clock can easily be
bypassed by choosing another inbound STI clock to
provide the clock to the slave VCO. A master failure
could be remedied by simply designating a new master.

Note that after initialization the entire network is
operating at the same frequency. In order to
guarantee synchronous data transfer from one board
to another the phase of incoming data must be
modified to account for the different physical
distance between boards, cable manufacturing
tolerances, board wiring and tolerances, temperature
and power supply differences, etc. The STI is used to
detect and adjust the phase of the incoming data to
properly synchronize it to the local board clock. The
STI will also track temperature and power supply
variations to ensure that proper synchronization is
maintained during normal system operation. We call
this type of network or system "semi-synchronous,"
since it operates differently from conventional system
clocking (synchronous) approaches. It is neither a
conventional asynchronous nor a conventional
synchronous system, but something somewhere in
the middle.
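
The way the master clock propagates from board to board, and how a failed link is bypassed, can be modeled as a graph traversal. The sketch below assumes a breadth-first order of selection, which the text does not specify; the function name, the board labels, and the representation of links as pairs are all hypothetical.

```python
from collections import deque


def assign_reference_clocks(links, master, failed_links=frozenset()):
    """Pick a reference clock source for every reachable board.

    links: iterable of (board_a, board_b) pairs joined by STI links.
    failed_links: set of frozenset({a, b}) pairs to bypass.
    Returns {board: source}; the master maps to "local oscillator".
    """
    neighbours = {}
    for a, b in links:
        if frozenset((a, b)) in failed_links:
            continue
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    reference = {master: "local oscillator"}
    queue = deque([master])
    while queue:
        board = queue.popleft()
        for nxt in neighbours.get(board, []):
            if nxt not in reference:
                reference[nxt] = board  # lock VCO to this inbound STI clock
                queue.append(nxt)
    return reference
```

Calling the function again with a different `master`, or with the failed link added to `failed_links`, corresponds to designating a new master or choosing another inbound STI clock as described above.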

Claims 

1. A network of computer processors intercon- 
nected in parallel comprising in combination: 

a plurality of computer processors each of 
which functions as a transmitting node and as 
a receiving node; 

a self-timed interface connecting each com- 
puter processor in said network to each other 
computer processor in said network; 
said self-timed interface including a transmit- 
ting node for transmitting a digital data and a 
clock signal and a receiving node for receiving 
said digital data and said clock signal, said 
transmitting node connected to said receiving 
node by a parallel data bus to individual lines 
of which a digital data stream is coupled by 
said clock signal at said transmitting node, 
said bus including a separate line for transmit- 
ting said clock signal, and said receiving node 
including means to phase align a digital data 
stream on each of said lines separately with 
respect to said clock signal. 

2. A network of computer processors as in claim 
1 further including a plurality of self-timed in- 
terface switching modules. 

3. A network of computer processors as in claim 
1 further including a plurality of self-timed in- 
terface switching modules, each said switching 
module providing an internal cross-connection 
between external communication ports and a 
group of processor nodes connected to each 
of said plurality of self-timed interface switch- 
ing modules and a self-timed interface con- 
necting said external communications ports of 
said plurality of self-timed interface switching 
modules. 

4. A network of computer processors as in claim 
3 wherein each switching module includes a 
first plurality of input-output switches to which 
all other switching modules in said network are 
connected, a second plurality of switches to
which said computer processors are connected.

5. A method for semi-synchronous transmission
of data among a plurality of processor units,
comprising in combination:

transmitting data among said processors as
parallel streams of digital data coupled to separate
lines of a parallel bus by a clock signal;
transmitting said clock signal on a separate
line of said bus;

phase aligning each data stream separately
with said transmitted clock signal;
synchronizing a local clock oscillator with said
transmitted clock signal.

6. A method for semi-synchronous transmission 
of data among a plurality of processor units as 
in claim 5 wherein said plurality of processor 
units includes three or more processor units 
and includes the further steps of: 
designating one of said processor units as a 
master processor unit and the remaining pro- 
cessor units as slave processor units; 
transmitting said clock signal of said master 
processor unit to each of said slave units. 

7. A method of semi-synchronous transmission of
data among a plurality of processor units as in 
claim 5 wherein said master clock signal is 
transmitted over multiple paths. 